Non-negative matrix factorization using TensorFlow

contributed by Nipun Batra

Perform NNMF in TensorFlow on a matrix with missing entries, mimicking the movie-recommendation problem. We use projected gradient descent: at each iteration we take a gradient descent step on the reconstruction cost, then project the factors back onto the feasible set by clipping any negative entries to zero.
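As a mental model before the TensorFlow version, here is a minimal NumPy sketch of a single projected-gradient update (the function name and signature are illustrative, not part of the original notebook; A may contain NaN at the missing entries and mask is True where A is observed):

import numpy as np

def projected_gd_step(A, W, H, mask, lr=0.001):
    # Residual on the observed entries only; missing entries contribute 0
    R = np.where(mask, A - W @ H, 0.0)
    # Gradients of the masked squared error w.r.t. W and H
    grad_W = -2 * R @ H.T
    grad_H = -2 * W.T @ R
    # Gradient step followed by projection onto the non-negative orthant
    W = np.maximum(W - lr * grad_W, 0)
    H = np.maximum(H - lr * grad_H, 0)
    return W, H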


In [20]:
# Customary imports

import tensorflow as tf
import numpy as np
import pandas as pd
np.random.seed(0)

In [7]:
# Creating the matrix to be decomposed

A_orig = np.array([[3, 4, 5, 2],
                   [4, 4, 3, 3],
                   [5, 5, 4, 4]], dtype=np.float32).T

A_orig_df = pd.DataFrame(A_orig)

In [8]:
A_orig_df  # (4 users, 3 movies)


Out[8]:
     0    1    2
0  3.0  4.0  5.0
1  4.0  4.0  5.0
2  5.0  3.0  4.0
3  2.0  3.0  4.0

In [21]:
# Masking some entries

A_df_masked = A_orig_df.copy()
A_df_masked.iloc[0, 0] = np.nan
np_mask = A_df_masked.notnull()
np_mask


Out[21]:
       0     1     2
0  False  True  True
1   True  True  True
2   True  True  True
3   True  True  True

Basic TensorFlow setup


In [11]:
# Boolean mask for computing cost only on valid (not missing) entries
tf_mask = tf.Variable(np_mask.values)

A = tf.constant(A_df_masked.values)
shape = A_df_masked.values.shape

# Number of latent factors
rank = 3

# Initializing random H and W
temp_H = np.random.randn(rank, shape[1]).astype(np.float32)
temp_H = np.divide(temp_H, temp_H.max())

temp_W = np.random.randn(shape[0], rank).astype(np.float32)
temp_W = np.divide(temp_W, temp_W.max())

H = tf.Variable(temp_H)
W = tf.Variable(temp_W)
WH = tf.matmul(W, H)

Cost function


In [12]:
# Squared Frobenius norm of the reconstruction error, computed only on the observed entries
cost = tf.reduce_sum(tf.pow(tf.boolean_mask(A, tf_mask) - tf.boolean_mask(WH, tf_mask), 2))
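This is the squared Frobenius norm of the reconstruction error, restricted to the observed entries via tf.boolean_mask. For intuition, an equivalent NumPy expression (a sketch; A_np, WH_np and mask are hypothetical names standing in for dense versions of the tensors above) would be:

cost_np = np.sum((A_np[mask] - WH_np[mask]) ** 2)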

Misc. TensorFlow


In [13]:
# Learning rate
lr = 0.001
# Number of steps
steps = 1000
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cost)
init = tf.global_variables_initializer()

Ensuring non-negativity


In [14]:
# Clipping operation. This ensures that the learnt W and H are non-negative
clip_W = W.assign(tf.maximum(tf.zeros_like(W), W))
clip_H = H.assign(tf.maximum(tf.zeros_like(H), H))
clip = tf.group(clip_W, clip_H)
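Clipping at zero with tf.maximum is exactly the Euclidean projection onto the non-negative orthant, which is what makes this projected gradient descent rather than plain gradient descent. A quick illustrative NumPy analogue:

x = np.array([-0.5, 0.2, -1.3, 0.7], dtype=np.float32)
np.maximum(np.zeros_like(x), x)  # -> array([0. , 0.2, 0. , 0.7], dtype=float32)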

Main TensorFlow routine


In [15]:
with tf.Session() as sess:
    sess.run(init)
    for i in range(steps):
        sess.run(train_step)
        sess.run(clip)
        if i % 100 == 0:
            print("\nCost: %f" % sess.run(cost))
            print("*"*40)
    learnt_W = sess.run(W)
    learnt_H = sess.run(H)


Cost: 148.859848
****************************************

Cost: 3.930172
****************************************

Cost: 2.068570
****************************************

Cost: 1.418309
****************************************

Cost: 0.819721
****************************************

Cost: 0.399933
****************************************

Cost: 0.176080
****************************************

Cost: 0.079007
****************************************

Cost: 0.041353
****************************************

Cost: 0.027041
****************************************

Computing the prediction


In [16]:
learnt_H


Out[16]:
array([[ 0.86129224,  1.3388027 ,  1.97224879],
       [ 2.16338873,  0.97277433,  1.17212451],
       [ 0.25879648,  1.07861733,  1.09541821]], dtype=float32)

In [17]:
learnt_W


Out[17]:
array([[ 1.15797794,  0.97454673,  1.41825044],
       [ 1.44136858,  1.16967547,  0.79135358],
       [ 0.81640321,  1.98227394,  0.02636297],
       [ 1.38819814,  0.29285902,  0.8031919 ]], dtype=float32)

In [18]:
pred = np.dot(learnt_W, learnt_H)
pred_df = pd.DataFrame(pred)
pred_df.round()


Out[18]:
     0    1    2
0  3.0  4.0  5.0
1  4.0  4.0  5.0
2  5.0  3.0  4.0
3  2.0  3.0  4.0
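As a quick sanity check, one can also compare the imputed value for the masked (0, 0) entry against the held-out original (a small sketch reusing the names defined above):

# The (0, 0) entry was hidden before training; it rounds to 3.0 in the table above
print("imputed:  %0.3f" % pred_df.iloc[0, 0])
print("original: %0.3f" % A_orig_df.iloc[0, 0])  # 3.0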

Compare with the Original


In [19]:
A_orig_df


Out[19]:
     0    1    2
0  3.0  4.0  5.0
1  4.0  4.0  5.0
2  5.0  3.0  4.0
3  2.0  3.0  4.0
